Skip to content

Conversation

@Liyixin95
Copy link
Contributor

reading a 1.9GB ipc stream file.

before:

________________________________________________________
Executed in    3.05 secs    fish           external
   usr time    2.01 secs  276.00 micros    2.01 secs
   sys time    1.04 secs  443.00 micros    1.04 secs

peek memory: 3980.86 MB

after:

________________________________________________________
Executed in    2.22 secs    fish           external
   usr time    1.48 secs  265.00 micros    1.48 secs
   sys time    0.74 secs  439.00 micros    0.74 secs

peek memory: 2086.41 MB

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Sep 30, 2025
@codecov
Copy link

codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.90%. Comparing base (b1623ff) to head (a68be9a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #24671      +/-   ##
==========================================
- Coverage   81.92%   81.90%   -0.03%     
==========================================
  Files        1707     1707              
  Lines      235483   235453      -30     
  Branches     3000     3000              
==========================================
- Hits       192915   192836      -79     
- Misses      41801    41850      +49     
  Partials      767      767              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

scratch,
);

let new_pos = reader.stream_position()?;
Copy link
Member

@orlp orlp Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the reborrow of reader explicit here? That is, pass &mut (&mut *reader).take(block_length as u64) instead? Right now this only works because reader.take automatically gets transformed to (&mut *reader).take. I was really confused how this compiled since take takes self and should consume the &mut R.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have change this to &mut (&mut *reader).take(block_length as u64). But I still think it's a little verbose...

.unwrap_or_else(VecDeque::new);
let mut buffers: VecDeque<arrow_format::ipc::BufferRef> = buffers.iter().collect();

// check that the sum of the sizes of all buffers is <= than the size of the file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this check was removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't easily get file_size after removing the intermediate vec. But remove this check will indeed cause wrong result when file_size and buffer_size not match rather then report an error. Maybe we can change all the subsequent read_to_end to read_excat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't easily get file_size after removing the intermediate vec. But remove this check will indeed cause wrong result when file_size and buffer_size not match rather then report an error. Maybe we can change all the subsequent read_to_end to read_excat?

@orlp kindly pin in case you forget this thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Liyixin95 I think that should be the way forward then yes. I do want the length to be checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orlp now all subsequent read_to_end has been change to read_excat. read_excat require vec initial overhead, but I could try to eliminate this using read_buf_excat in another pr, if unstable api is acceptable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Liyixin95 Oh... I hadn't considered that it would add overhead. Could you change back to reserve + take + read_to_end and do a manual assert afterwards that checks the length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orlp done

@ritchie46 ritchie46 force-pushed the main branch 3 times, most recently from 90ceb7b to e9fce55 Compare October 26, 2025 16:01
@orlp
Copy link
Member

orlp commented Oct 27, 2025

Thanks, sorry it took a while to review :)

@orlp orlp merged commit 48840c6 into pola-rs:main Oct 27, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants